Abstract

Linear processes on function spaces were introduced about fifteen years
ago, and the topic has developed as quickly as the other areas of
functional data modeling, such as PCA or regression. These processes aim at
generalizing to random curves the classical ARMA models widely used
in time series analysis. They offer a wide spectrum of models suited to
statistical inference on continuous-time stochastic processes within the
paradigm of functional data. Essentially designed to improve the quality
and the range of prediction, they give rise to challenging theoretical and
applied problems. We propose here a state of the art which emphasizes
recent advances, and we present some promising perspectives based on our
experience in this area.

1 Introduction 

The aim of this chapter is twofold. First, we want to provide the reader
with the basic theory and applications of linear processes for functional
data. Second, we give a state of the art which complements
the monograph by Bosq (2000). Many crucial theorems were given in the
latter book, to which we will frequently refer. Consequently, even if our
work is self-contained, we pay special attention to recent results,
published from 2000 to 2008, and try to draw the lines of future and
promising research in this area.

It is worth recalling the approach that leads to modeling and inferring
from curve data. We start from a continuous-time stochastic process
$(\xi_t)_{t \ge 0}$. The paths of $\xi$ are cut into equally spaced pieces
of trajectories. Each of these pieces is then viewed as a random curve.
With mathematical symbols we set:

$$X_k(t) = \xi_{kT+t}, \qquad 0 \le t \le T$$

where $T$ is fixed. The function $X_k(\cdot)$ maps $[0,T]$ to $\mathbb{R}$
and is random. Observing $\xi$ over $[0, nT]$ produces an $n$-sample
$X_1, \ldots, X_n$. Obviously the choice of $T$ is crucial; it is usually
left to the practitioner and may be linked with seasonality (with period
$T$). Dependence along the paths of $\xi$ will create dependence between
the $X_k$'s. But this approach is not restricted to cutting the whole path
of a stochastic process. One could as well model whole curves observed at
discrete intervals: the interest rate curve at day $k$, $I_k(\delta)$, is
for instance a function linking the duration $\delta$ (as an input) and the
associated interest rate (as an output). Observing these curves, whose
random variations depend on financial markets, over $n$ days produces a
sample similar in nature to the one described above, although there is no
underlying continuous-time process in this situation, but rather a surface
$(k, \delta, I_k(\delta))$. We refer for instance to Kargin and Onatski
(2008) for an illustration.
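
As a purely illustrative aside, the cutting step can be sketched numerically. The snippet below is a minimal sketch in Python with NumPy (our choice of tool, not something prescribed by the references above). It assumes the path is observed on a regular grid with a whole number of points per period $T$; the helper name `cut_into_curves` and the toy path are hypothetical.

```python
import numpy as np

def cut_into_curves(path, points_per_curve):
    """Split a sampled path into consecutive curves of equal length.

    Row k of the result holds the samples of the path on the k-th period,
    i.e. a discretized version of X_k(t) = xi_{kT + t}.
    Trailing samples that do not fill a whole period are dropped.
    """
    path = np.asarray(path, dtype=float)
    n_curves = path.size // points_per_curve
    return path[: n_curves * points_per_curve].reshape(n_curves, points_per_curve)

# Hypothetical example: a path sampled 4 times per period, over 3 periods.
path = np.arange(12.0)
curves = cut_into_curves(path, points_per_curve=4)
```

Each row of `curves` then plays the role of one functional observation; the choice of `points_per_curve` mirrors the choice of $T$ discussed above.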

Statistical models are then proposed which mimic or adapt the scalar or
finite-dimensional approaches for time series (see Brockwell and Davis
(1987)). Each of these (random or not) functions is viewed as a vector in a
vector space of functions. This paradigm has long been adopted
in probability theory, as can be seen for instance in Ledoux and Talagrand
(1991) and references therein. But the first book entirely dedicated
to the formal and applied aspects of statistical inference in this setting
is certainly due to Ramsay and Silverman (1997), followed by Bosq (2000),
Ramsay and Silverman (2002), then Ferraty and Vieu (2006).

In the sequel we will consider centered processes with values in a Hilbert
space of functions denoted $H$, with inner product $\langle\cdot,\cdot\rangle$
and norm $\|\cdot\|$. The Banach setting, though more general, has several
drawbacks; some references will nevertheless be given throughout this
section. The reasons for privileging Hilbert spaces are both theoretical
and practical. First, many fundamental asymptotic theorems are stated under
simple assumptions in this setting. The central limit theorem is a good
example. Considering random variables with values in $C([0,1])$ or in
Hölder spaces, for instance, leads to very specific assumptions to get the
CLT, and computations are often awkward, whereas in a Hilbert space moment
conditions are usually both necessary and sufficient. The nice geometric
features of Hilbert spaces allow us to consider denumerable bases,
projections, etc. in a framework that generalizes the Euclidean space
with few drawbacks. Besides, in practice, recovering curves from
discretized observations by interpolation or smoothing techniques such as
splines or wavelets yields functions in Sobolev spaces, say $W^{m,2}$ (here
$m$ is an order of differentiation connected with the desired smoothness of
the output), which are all Hilbert spaces. We refer to Ziemer (1989) or to
Adams and Fournier (2003) for monographs on Sobolev spaces.

In statistical models, unknown parameters will be functions or linear
operators (the counterpart of matrices of the Euclidean space), the latter
being of utmost interest. We now give some basic facts about operators
which will be of great use in the sequel.

Several monographs are dedicated to operator theory, which is a major
theme within the mathematical sciences. Classical references are Dunford
and Schwartz (1988) and Gohberg, Goldberg and Kaashoek (1991). The adjoint
of the operator $T$ is classically denoted $T^*$. The Banach space of
compact operators $\mathcal{C}$ on a Hilbert space $H$ is separable when
endowed with the classical operator norm $\|\cdot\|_\infty$:

$$\|T\|_\infty = \sup_{x \in B_1} \|Tx\|$$

where $B_1$ denotes the unit ball of the Hilbert space $H$. The space
$\mathcal{C}$ contains the set of Hilbert-Schmidt operators, which is a
Hilbert space denoted $\mathcal{S}$. Let $T$ and $S$ belong to
$\mathcal{S}$; the inner product between $T$ and $S$ and the norm of $T$
are respectively defined by:

$$\langle S, T\rangle_{\mathcal{S}} = \sum_p \langle S e_p, T e_p\rangle,
\qquad \|T\|_{\mathcal{S}}^2 = \sum_p \|T e_p\|^2$$

where $(e_p)_{p\in\mathbb{N}}$ is a complete orthonormal system (c.o.n.s.)
in $H$. The inner product and the norm defined just above do not depend on
the choice of the c.o.n.s. $(e_p)_{p\in\mathbb{N}}$. The nuclear (or
trace-class) operators are another important family of operators, those
for which the following series converges:

$$\sum_p \|T e_p\| < +\infty.$$

It is plain that a trace-class operator is Hilbert-Schmidt as well. Many of
the asymptotic results mentioned from now on and involving random
operators are obtained for the Hilbert-Schmidt norm, unless explicitly
mentioned. It should be noted as well that this norm is finer than the
usual operator norm, since $\|T\|_\infty \le \|T\|_{\mathcal{S}}$.
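
For readers who prefer a numerical check, the following sketch uses finite-dimensional matrices as stand-ins for operators (an assumption made only for illustration; in infinite dimension the sums are series). It computes the Hilbert-Schmidt inner product in two different orthonormal bases and compares the Hilbert-Schmidt norm with the operator norm. All variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))   # finite-dimensional stand-in for an operator
S = rng.standard_normal((5, 5))

# Hilbert-Schmidt inner product and norm computed in the canonical basis:
# T e_p is simply the p-th column of T.
hs_inner = sum(np.dot(S[:, p], T[:, p]) for p in range(5))
hs_norm = np.sqrt(sum(np.linalg.norm(T[:, p]) ** 2 for p in range(5)))

# The same inner product in another orthonormal basis (columns of Q).
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
hs_inner_Q = sum(np.dot(S @ Q[:, p], T @ Q[:, p]) for p in range(5))

# Operator norm: largest singular value, i.e. sup of ||Tx|| over the unit ball.
op_norm = np.linalg.norm(T, 2)
```

In this matrix setting the Hilbert-Schmidt norm coincides with the Frobenius norm, and `op_norm` never exceeds `hs_norm`.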

The next section is devoted to general linear processes. Then we will 
focus on the autoregressive model and its recent advances, which will be 
developed in the third section. We will conclude with some issues for future 
work. 
2 General linear processes



The linear processes on function spaces generalize the classical scalar or 
vector linear processes to random elements which are curves or functions 
and more generally valued in an infinite-dimensional separable Hilbert space 
H. 

Definition 1 Let $(\epsilon_k)_{k\in\mathbb{Z}}$ be a sequence of i.i.d.
centered random elements in $H$, let $(a_k)_{k\in\mathbb{N}}$ be a sequence
of bounded linear operators from $H$ to $H$ such that $a_0 = I$, and let
$\mu \in H$ be a fixed vector. If

$$X_n = \mu + \sum_{j=0}^{+\infty} a_j(\epsilon_{n-j}), \qquad (1)$$

then $(X_n)_{n\in\mathbb{Z}}$ is a linear process on $H$ (denoted in the
sequel $H$-linear process) with mean $\mu$.

Unless explicitly mentioned, the mean function will always be assumed
to be null (and the process $X$ is centered). It seems that, after a
collection of papers dating back to the late 1990s and early 2000s, the
model has inspired less work in the community. We guess that the recent
work by Bosq (2007) and the book by Bosq and Blanke (2007) may bring some
fresh ideas. We state here some basic facts: invertibility and convergence
of estimated moments.

2.1 Invertibility 

When the sequence $\epsilon$ is a strong $H$-white noise, that is a
sequence of i.i.d. random elements such that $E\|\epsilon\|^2 < +\infty$,
and whenever

$$\sum_{j=0}^{+\infty} \|a_j\|_\infty^2 < +\infty \qquad (2)$$

the series defining the process $(X_n)_{n\in\mathbb{Z}}$ through (1)
converges in square norm and almost surely through the 0-1 law. The strict
stationarity of $X_n$ is ensured as well. The problem of invertibility is
addressed in Merlevede (1995).

Theorem 1 If $(X_n)_{n\in\mathbb{Z}}$ is a linear process with values in
$H$ defined by (1) and such that:

$$\sum_{j=1}^{+\infty} \|a_j\|_\infty < 1 \qquad (3)$$

then $(X_n)_{n\in\mathbb{Z}}$ is invertible:

$$X_n = \epsilon_n + \sum_{j=1}^{+\infty} \rho_j(X_{n-j})$$

where all the $\rho_j$'s are bounded linear operators in $H$ with
$\sum_{j=1}^{+\infty} \|\rho_j\|_\infty < +\infty$, and the series
converges in mean square and almost surely.

Remark 1 We deduce from the latter that $\epsilon_n$ is the innovation of
the process $X$ and that (1) coincides with the Wold decomposition of $X$.

We now give some convergence theorems for the mean and the covariance
of Hilbert-valued linear processes. These results are not completely new
but they are essential.

2.2 Asymptotics 

It is worth mentioning a general scheme for proving asymptotic results for
linear processes. Although several approaches are possible, it turns out
that, in the authors' opinion, one of the most fruitful relies on
approximating the process $X_n$ by truncated versions like:

$$X_{n,m} = \sum_{j=0}^{m} a_j(\epsilon_{n-j})$$

where $m \in \mathbb{N}$. The sequence $X_{n,m}$ is, for fixed $m$,
blockwise independent: $X_{n+m+1,m}$ is indeed stochastically independent
from $X_{n,m}$ if the $\epsilon_j$'s are. The outline of the proofs usually
consists in proving asymptotic results for the $m$-dependent sequence
$X_{n,m}$, then in letting $m$ tend to infinity with an accurate control of
the residual $X_n - X_{n,m} = \sum_{j=m+1}^{+\infty} a_j(\epsilon_{n-j})$.
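
The truncation scheme can be illustrated on a toy example. The sketch below is only an illustration under strong assumptions of ours: $H$ is replaced by $\mathbb{R}^d$, the operators are $a_j = \rho^j$ for a hypothetical matrix `rho` with operator norm 0.6, the innovations are standard Gaussian, and a large horizon `M` stands in for the infinite sum. It compares the empirical mean-square residual with the crude control $E\|\epsilon\|^2 \sum_{j>m} \|a_j\|_\infty^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, M, n = 4, 60, 200            # dimension, proxy for the infinite horizon, sample size
rho = 0.5 * np.eye(d) + 0.1 * rng.standard_normal((d, d))
rho *= 0.6 / np.linalg.norm(rho, 2)                 # force operator norm 0.6 < 1
a = [np.linalg.matrix_power(rho, j) for j in range(M + 1)]   # a_j = rho^j

eps = rng.standard_normal((n + M, d))               # i.i.d. centered innovations

def truncated(m):
    """Return X_{t,m} = sum_{j=0}^{m} a_j(eps_{t-j}) for the last n time indices."""
    return np.array([sum(a[j] @ eps[t - j] for j in range(m + 1))
                     for t in range(M, n + M)])

X_full = truncated(M)                               # proxy for the untruncated process
mse = {m: np.mean(np.sum((X_full - truncated(m)) ** 2, axis=1)) for m in (2, 5, 10)}

# Control of the residual in mean square: E||eps||^2 * sum_{j>m} ||a_j||^2, with E||eps||^2 = d.
bound = {m: d * sum(np.linalg.norm(a[j], 2) ** 2 for j in range(m + 1, M + 1))
         for m in (2, 5, 10)}
```

Both `mse[m]` and `bound[m]` shrink quickly as `m` grows, which is the kind of control on the residual that the proofs rely on.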

2.2.1 Mean 

Asymptotic results for the mean of a linear process may be found in
Merlevede (1996) and Merlevede, Peligrad and Utev (1997). Even if the first
is in a way more general, we focus here on the second article since it
deals directly with the mean of the non-causal process indexed by
$\mathbb{Z}$:

$$X_k = \sum_{j=-\infty}^{+\infty} a_j(\epsilon_{k-j}).$$

The authors obtain sharp conditions for the CLT of
$S_n = \sum_{k=1}^{n} X_k$.

Theorem 2 Let $(a_j)_{j\in\mathbb{Z}}$ be a sequence of operators such
that:

$$\sum_{j=-\infty}^{+\infty} \|a_j\|_\infty < +\infty.$$

Then

$$\frac{S_n}{\sqrt{n}} \xrightarrow{w} N(0, A\Gamma_\epsilon A^*)$$

where $N(0, A\Gamma_\epsilon A^*)$ is the $H$-valued centered Gaussian
random element with covariance operator $A\Gamma_\epsilon A^*$, where
$\Gamma_\epsilon = E(\epsilon_0 \otimes \epsilon_0)$ is the covariance
operator of $\epsilon_0$ and $A = \sum_{j=-\infty}^{+\infty} a_j$.

Recall that if $u$ and $v$ are two vectors in $H$, the notation
$u \otimes v$ stands for the rank-one linear operator from $H$ to $H$
defined by: $(u \otimes v)(x) = \langle v, x\rangle u$.
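
As a small numerical illustration (with $H$ replaced by $\mathbb{R}^3$, an assumption made for the example only), the rank-one operator is represented by the outer product of the coordinate vectors:

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, 1.0])
x = np.array([3.0, 4.0, 5.0])

# Matrix of the tensor product of u and v in the canonical basis:
# applying it to x returns <v, x> u.
u_tensor_v = np.outer(u, v)
image = u_tensor_v @ x            # here <v, x> = 9, so the image is 9 * u
```

The same outer-product representation is what makes empirical covariance operators computable from discretized curves.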

This result is extended, with additional assumptions, to the case of
strongly mixing $\epsilon_k$'s. Note that the problem of weak convergence
for the mean of stationary Hilbertian processes under mixing conditions had
been addressed in the early 1980s by Maltsev and Ostrovski (1982). Nazarova
(2000) proved the same sort of theorem when $X$ is a linear random field
with values in a Hilbert space. We now turn to the rate of convergence of
the empirical mean in quadratic mean and almost surely, which follows from
a standard equi-integrability argument and classical techniques. The
following result may be found in Bosq (2000).

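Proposition 1 Let $X_k = \sum_{j=0}^{+\infty} a_j(\epsilon_{k-j})$ and
$S_n = \sum_{k=1}^{n} X_k$; then

$$n\,E\left\|\frac{S_n}{n}\right\|^2 \to
\sum_{k=-\infty}^{+\infty} E\langle X_0, X_k\rangle,$$

$$\frac{n^{1/4}}{(\log n)^{1/2+\epsilon}} \left\|\frac{S_n}{n}\right\|
\to 0 \quad \text{a.s.}$$

for all $\epsilon > 0$.

We turn to covariance operators now.

2.2.2 Covariance operators

The situation is slightly more complicated than for the mean due to the
tensor product.
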
Definition 2 The theoretical covariance operator at lag $h \in \mathbb{N}$
of a process $X$ is defined by:

$$\Gamma_h = E(X_h \otimes X_0).$$

The linear operator $\Gamma_h$ is nuclear on $H$ when the second order
strong moment of $X$ is finite. Its empirical counterpart based on the
sample $X_1, \ldots, X_n$ is:

$$\Gamma_{n,h} = \frac{1}{n-h} \sum_{t=1}^{n-h} X_{t+h} \otimes X_t.$$

The covariance operator of the process, $\Gamma_0 = \Gamma$, is
selfadjoint, positive and nuclear, hence Hilbert-Schmidt and compact.

It should be noted that $\Gamma_h$ is not in general a symmetric operator,
contrary to the classical covariance operator $\Gamma_0$. The weak
convergence of covariance operators for $H$-linear processes was addressed
by Mas (2002). It is assumed that:

$$E\|\epsilon_0\|^4 < +\infty \quad \text{and} \quad
\sum_{j=-\infty}^{+\infty} \|a_j\|_\infty < +\infty;$$

then the vector of the covariance operators up to any fixed lag $h$ is
asymptotically Gaussian in the Hilbert-Schmidt norm.

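Theorem 3 Let us consider the following linear and Hilbert space valued
process

$$X_t = \sum_{j=-\infty}^{+\infty} a_j(\epsilon_{t-j});$$

then

$$\sqrt{n} \begin{pmatrix} \Gamma_{n,0} - \Gamma_0 \\ \vdots \\
\Gamma_{n,h} - \Gamma_h \end{pmatrix}
\xrightarrow[n \to +\infty]{w} G_\Gamma$$

where $G_\Gamma = (G_\Gamma^{(0)}, \ldots, G_\Gamma^{(h)})$ is a Gaussian
centered random element with values in $\mathcal{S}^{h+1}$. Its covariance
operator is $\Theta_\Gamma = (\Theta_\Gamma^{(p,q)})_{0 \le p,q \le h}$,
which is a nuclear operator in $\mathcal{S}^{h+1}$ defined blockwise for
all $T$ in $\mathcal{S}$ by

$$\Theta_\Gamma^{(p,q)}(T) = \sum_{h} \Gamma_{h+p-q} T \Gamma_h
+ \sum_{h} \Gamma_{h+q} T \Gamma_{h-p}
+ A_q (\Lambda - \Phi) A_p (T) \qquad (4)$$

where $\Lambda$, $\Phi$ and $A_p$ are linear operators from $\mathcal{S}$
to $\mathcal{S}$ respectively defined by

$$\Lambda(T) = E\big((\epsilon_0 \otimes \epsilon_0)\,
\widetilde{\otimes}\, (\epsilon_0 \otimes \epsilon_0)\big)(T),$$
$$\Phi(T) = C(T + T^*)C + (C\, \widetilde{\otimes}\, C)(T)$$

and $A_p(T) = \sum_{i} a_{i+p} T a_i^*$.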

As by-products, weak convergence results for the eigenelements of
$\Gamma_{n,0}$ relative to those of $\Gamma_0$, that is for the PCA of the
stationary process $X$, are derived. The reader interested in these
developments should refer to the paper by Mas and Menneteau (2003a), which
proposes a general method to derive asymptotics for the eigenvalues and
eigenvectors of $\Gamma_{n,0}$ (the by-products of the functional PCA) from
the covariance sequence itself. Perturbation theory is the main tool,
through a modified delta-method.

2.3 Perspectives and trends: towards generalized linear processes?

It turns out that the literature on inference methods for general linear
processes is rather meager. Obviously, estimating many $a_j$'s
simultaneously seems intricate and not necessarily needed, as the
functional AR process, which will be presented in the next section, is
quite successful and easier to handle. However, general linear processes
are the starting point for very interesting theoretical problems where
dependence plays a key role. We finally mention the abstract papers by
Merlevede and Dedecker (2003), especially Section 2.4, dedicated to proving
a conditional central limit theorem for linear processes under mild
assumptions, and Merlevede and Dedecker (2007), whose Section 3 deals with
rates in the law of large numbers. These works may provide theoretical
material to go further into the asymptotic study of these processes.

In a very recent article, Bosq (2007) introduces the notion of linear
process in the wide sense. The definition remains essentially the same as
in display (1), but the operators $(a_j)_{j\in\mathbb{N}}$ may then be
unbounded, which generalizes the notion. A key role is played by linear
closed spaces (LCS), introduced by Fortet. An LCS $\mathcal{G}$ is a
subspace of $L^2_H$ (the space of random variables with values in $H$ and
finite strong second moment) such that:

(i) $\mathcal{G}$ is closed in $L^2_H$.

(ii) If $X \in \mathcal{G}$, then $\ell(X) \in \mathcal{G}$ for every
bounded linear operator $\ell$.

This theory (involving projection on LCS, weak and strong orthogonality,
dominance of operators) allows Bosq to revisit and extend the notions of
linear process, Wold decomposition and Markovian process when the bounded
$(a_j)_{j\in\mathbb{N}}$ may be replaced with measurable mappings
$(\ell_j)_{j\in\mathbb{N}}$. Several examples are given: derivatives of
functional processes as in the MAH $X_n = \epsilon_n + c\,\epsilon'_{n-1}$,
arrays of linear processes, truncated Ornstein-Uhlenbeck process... The
personal communication Bosq (2009) discusses extensions to tensor products
of linear processes and will certainly shed new light on their covariance
structure. We also refer to Chapters 10 and 11 in the book by Bosq and
Blanke (2007) for an exposition of these concepts.

3 Autoregressive processes 
3.1 Introduction 

The model generalizes the classical AR(1) for scalar or multivariate time
series to functional data and was introduced for the first time in Bosq
(1991). Let $X_1, \ldots, X_n$ be a sample of random curves for which a
stochastic dependence is suspected (for instance the curve of temperature
observed during $n$ days at a given place). We assume that all the $X_i$'s
are valued in a Hilbert space $H$ and set:

$$X_n = \rho(X_{n-1}) + \epsilon_n \qquad (5)$$

where $\rho$ is a linear operator from $H$ to $H$ and
$(\epsilon_n)_{n\in\mathbb{N}}$ is a sequence of $H$-valued centered random
elements, usually with a common covariance operator. The model is simple,
with a single unknown operator, yet it leaves room for various assumptions
either on the operator $\rho$ (linear, compact, Hilbert-Schmidt, symmetric
or not, etc.) or on the dependence between the $\epsilon_n$'s. The latter
are quite often independent and identically distributed, but alternatives
are possible (mixing or, more naturally, martingale differences). Bosq
(2000) proved that assumption (2) actually comes down to the existence of
$a > 0$ and $b \in [0, 1[$ such that for all $p \in \mathbb{N}$:

$$\|\rho^p\|_\infty \le a\, b^p$$

which ensures that (5) admits a unique stationary solution. The process
$(X_n)_{n\in\mathbb{N}}$ is Markov as soon as
$E(\epsilon_n | X_{n-1}, \ldots, X_1) = 0$. As often noted, the interest of
the model lies in its predictive power. The estimation of $\rho$ is
usually the first and necessary step before deriving the statistical
predictor given the new input $X_{n+1}$: $\rho_n(X_{n+1})$, where $\rho_n$
denotes an estimate of $\rho$. The predictions are often compared with
those of ARMA models or of non-parametric smoothing techniques. The global
treatment of the trajectory as a function often ensures better long-run
prediction, but at the expense of more tedious numerical procedures.



3.1.1 Representation of stochastic processes by functional AR 

Various real-valued processes admit an ARH representation. We plotted
in Figure 1 graphs of two simulated processes, the Ornstein-Uhlenbeck
(O-U) process and the Wong process. The O-U process
$(\eta_t, t \in \mathbb{R})$ is a real stationary Gaussian process:

$$\eta_t = \int_{-\infty}^{t} e^{-a(t-s)}\, dw_s$$

where $(w_t)_{t\in\mathbb{R}}$ is a bilateral standard Wiener process and
$a$ a positive constant. Bosq (1996) gives the ARH representation
$X_n = \rho(X_{n-1}) + \epsilon_n$ with values in $H := L^2[0,1]$, where
$X_n(t) = \eta_{n+t}$, $t \in [0,1]$, $n \in \mathbb{Z}$, and $\rho$ is a
degenerate linear operator

$$\rho(x)(t) = e^{-at} x(1), \quad t \in [0,1],\ x \in H$$

and

$$\epsilon_n(t) = \int_{n}^{n+t} e^{-a(n+t-s)}\, dw_s, \quad
t \in [0,1],\ n \in \mathbb{Z}.$$
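
The ARH representation of the O-U process can be checked by hand. The short derivation below is a sketch assuming the usual stationary representation $\eta_t = \int_{-\infty}^{t} e^{-a(t-s)}\,dw_s$; it simply splits the stochastic integral at time $n$.

```latex
\begin{align*}
\eta_{n+t}
  &= \int_{-\infty}^{n+t} e^{-a(n+t-s)}\,dw_s \\
  &= e^{-at}\int_{-\infty}^{n} e^{-a(n-s)}\,dw_s
     + \int_{n}^{n+t} e^{-a(n+t-s)}\,dw_s \\
  &= e^{-at}\,\eta_n + \epsilon_n(t), \qquad t\in[0,1].
\end{align*}
```

Since $\eta_n = X_{n-1}(1)$, this reads $X_n(t) = e^{-at} X_{n-1}(1) + \epsilon_n(t)$, that is $X_n = \rho(X_{n-1}) + \epsilon_n$ with $\rho(x)(t) = e^{-at}x(1)$. The innovation $\epsilon_n$ only involves the increments of $w$ on $[n, n+1]$, which is why the $\epsilon_n$'s are independent.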

The Wong process is a mean-square differentiable stationary Gaussian
process which is zero-mean and is defined for $t \in \mathbb{R}$ by:

$$\xi_t = \sqrt{3} \exp(-\sqrt{3}\, t)
\int_{0}^{\exp(2t/\sqrt{3})} w_u\, du.$$

Cutting $\mathbb{R}$ into intervals of length 1 and defining
$X_n(t) = \xi_{n+t}$ for $t \in [0,1]$, Mas and Pumo (2007) obtain the ARH
representation $X_n = A(X_{n-1}) + \epsilon_n$ of this process with values
in the Sobolev space $W := W^{1,2} = \{u \in L^2, u' \in L^2\}$ and

$$\epsilon_n(t) = \sqrt{3} \exp[-\sqrt{3}(n-1+t)]
\int_{\exp[2(n-1)/\sqrt{3}]}^{\exp[2(n-1+t)/\sqrt{3}]}
\left( w_u - w_{\exp[2(n-1)/\sqrt{3}]} \right) du$$

for $t \in [0,1]$. The linear and degenerate operator $A$ is given by
$\phi + \Psi(D)$, where $D$ is the ordinary differential operator and

$$[\phi(f)](t) = [\exp(-\sqrt{3}\, t) + \sqrt{3}\, c(t)]\, f(1), \qquad
[\Psi(D)(f)](t) = c(t)\, f'(1),$$

with $c(t) = \frac{\sqrt{3}}{2} \exp(-\sqrt{3}\, t)
\{\exp(2t/\sqrt{3}) - 1\}$.

Other examples are given in the paper by Bosq (1996) or in the classical
book by Bosq (2000).

A major issue would be to infer deeper links between autoregressive
functional processes and diffusion processes or stochastic differential
equations with more general autocorrelation operators. This remains an
open question, and the work by Ramsay (2000) on this topic certainly
deserves more attention and further extension.



3.1.2 Asymptotics for the mean and covariance 

Obviously all the results obtained for general linear processes hold for
the ARH(1), namely for the mean and the covariance operator. Some new
results are stated below (they are new essentially with respect to Bosq
(2000)) and are related to moderate deviations (Mas and Menneteau (2003b))
and laws of the iterated logarithm (Menneteau (2003)). Let $\eta$ be a
square integrable real-valued random variable. We need to introduce the
following notations ($I_X$ and $J_X$ are functions defined on $H$,
$J_\Gamma$ is a function defined on $\mathcal{S}$ and $K_\Gamma$ is a
subset of $\mathcal{S}$):

$$u_1 = \rho(X_0) \otimes \epsilon_1 + \epsilon_1 \otimes \rho(X_0)
+ \epsilon_1 \otimes \epsilon_1 - \Gamma_\epsilon, \qquad (6)$$

$$I_X(x) = \sup_{h \in H} \{ \langle h, x - \rho(x) \rangle
- \log E \exp \langle h, \epsilon_1 \rangle \},$$

$$J_X(x) = \frac{1}{2} \inf \{ E\eta^2 :
x = E[\eta (I_H - \rho)^{-1}(\epsilon_1)] \},$$

$$J_\Gamma(s) = \frac{1}{2} \inf \{ E\eta^2 :
s = E[\eta (I_{\mathcal{S}} - R)^{-1}(u_1)] \},$$

$$K_\Gamma = \{ E[\eta (I_{\mathcal{S}} - R)^{-1}(u_1)] :
\eta \in L^2(P),\ E\eta^2 \le 1 \}$$

and $R$ is a linear operator from $\mathcal{S}$ to $\mathcal{S}$ defined by
$R(s) = \rho s \rho^*$.

We refer to Dembo and Zeitouni (1993) for an exposition on large and
moderate deviations.

Theorem 4 The empirical mean of the ARH(1) process follows the large
deviation principle in $H$ with rate function $I_X$ and the moderate
deviation principle with rate function $J_X$.

The covariance sequence of the ARH(1), $\Gamma_n - \Gamma$, follows the
moderate deviation principle in the space of Hilbert-Schmidt operators
with rate function $J_\Gamma$ and the law of the iterated logarithm with
limit set $K_\Gamma$.

The first results obtained on the covariance sequence are in Bosq (1991) 
but we mention here the interesting decomposition given in Bosq (1999). 

Proposition 2 Let $X_n$ be an ARH(1) such that $E\|X_0\|^4 < +\infty$;
then the tensorized process

$$Z_i = X_i \otimes X_i - \Gamma$$

is an autoregressive process with values in $\mathcal{S}$ such that:

$$Z_i = R(Z_{i-1}) + u_i$$

where $R(S) = \rho S \rho^*$ and $u_i$ was defined at (6). The sequence
$u_i$ is a martingale difference with respect to the filtration
$\sigma(\epsilon_i, \epsilon_{i-1}, \ldots)$.

3.2 Two issues related to the general estimation problem
3.2.1 Identifiability

The moment method provides the following normal equation:

$$\Delta = \rho \Gamma \qquad (7)$$

where

$$\Gamma = E(X_1 \otimes X_1), \qquad \Delta = E(X_2 \otimes X_1)$$

are the covariance operator (resp. the cross-covariance operator of order
one) of the process $(X_n)_{n\in\mathbb{N}}$.
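
For completeness, the normal equation can be recovered in one line from the model equation. The sketch below assumes, as in the i.i.d. case, that $\epsilon_2$ is centered and independent of $X_1$, and uses the elementary identity $(\rho u) \otimes v = \rho\,(u \otimes v)$ for the rank-one operator $u \otimes v$.

```latex
\begin{align*}
\Delta = E(X_2 \otimes X_1)
  &= E\big( (\rho(X_1) + \epsilon_2) \otimes X_1 \big) \\
  &= \rho\, E(X_1 \otimes X_1) + E(\epsilon_2 \otimes X_1)
   = \rho\, \Gamma .
\end{align*}
```

The second term vanishes because $E(\epsilon_2 \otimes X_1)(x) = E[\langle X_1, x\rangle \epsilon_2] = 0$ for every $x$ under the stated assumption.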

The first step consists in checking that the Yule-Walker equation (7)
correctly defines the unknown parameter $\rho$.

Proposition 3 When the inference on $\rho$ is based on the moment equation
(7), identifiability holds if $\ker \Gamma = \{0\}$.

The proof of this proposition is plain: taking
$\tilde{\rho} = \rho + u \otimes v$, where $u \ne 0$ and $v \ne 0$ belongs
to the kernel of $\Gamma$ (whenever this kernel is not reduced to $\{0\}$),
we see that

$$\tilde{\rho}\, \Gamma = \rho \Gamma + u \otimes \Gamma v = \rho \Gamma$$

hence that (7) holds for $\tilde{\rho} \ne \rho$.
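
The argument of the proof can be reproduced numerically. The sketch below is a finite-dimensional illustration only: the covariance matrix `Gamma`, with a deliberately non-trivial kernel, and the operator `rho` are hypothetical choices of ours.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
# Hypothetical covariance with a non-trivial kernel: the last basis vector.
Gamma = np.diag([1.0, 0.5, 0.25, 0.0])
rho = 0.3 * rng.standard_normal((d, d))

v = np.array([0.0, 0.0, 0.0, 1.0])     # v belongs to ker(Gamma)
u = rng.standard_normal(d)
rho_tilde = rho + np.outer(u, v)        # rho plus the rank-one operator built from u and v

same_moment = np.allclose(rho_tilde @ Gamma, rho @ Gamma)
different_operator = not np.allclose(rho_tilde, rho)
```

Both flags are true: the two operators differ but produce the same cross-covariance, so the moment equation alone cannot tell them apart.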

Consequently the injectivity of $\Gamma$ is a basic assumption which can
hardly be removed, and which entails that $\Gamma$ has infinitely many
strictly positive eigenvalues. These eigenvalues will be denoted
$(\lambda_j)_{j\in\mathbb{N}}$, where one assumes once and for all that the
$\lambda_j$'s are arranged in decreasing order with
$\sum_{j\in\mathbb{N}} \lambda_j$ finite. The corresponding eigenvectors
(resp. eigenprojectors) will be denoted $(e_j)_{j\in\mathbb{N}}$ (resp.
$(\pi_j)_{j\in\mathbb{N}}$, where $\pi_j = e_j \otimes e_j$).
Heuristically, with (7) at hand, we should expect that $\Gamma^{-1}$ must
exist in order to estimate $\rho$, and this inverse will not be defined if
$\Gamma$ is not one to one.

3.2.2 The inverse problem 

Even if identifiability is ensured, estimating $\rho$ is a difficult task
due to an underlying inverse problem which stems from display (7). The
notion of inverse (or ill-posed) problem is classical in mathematical
analysis (see for instance Tikhonov and Arsenin (1977) or Groetsch (1993)).
In our framework it means that any attempt to estimate $\rho$ directly from
equation (7) results in a highly unstable estimate. In simple words, which
will be developed below, this comes from the inversion of $\Gamma$. A
canonical example of an inverse problem is the numerical inversion of an
ill-conditioned matrix (that is, a matrix with eigenvalues close to zero).
The first stumbling block comes from the fact that we cannot deduce
from (7) that $\Delta \Gamma^{-1} = \rho$. We know that a sufficient
condition for $\Gamma^{-1}$ to be defined as a linear mapping is
$\ker \Gamma = \{0\}$. Then $\Gamma^{-1}$ is an unbounded symmetric
operator on $H$. Some consequences are collected in the next proposition:

Proposition 4 When $\Gamma$ is injective, $\Gamma^{-1}$ may be defined. It
is a linear measurable mapping defined on a dense domain in $H$, denoted
$\mathcal{D}(\Gamma^{-1})$ and defined by:

$$\mathcal{D}(\Gamma^{-1}) = \mathrm{Im}\, \Gamma = \left\{
x = \sum_{p=1}^{+\infty} x_p e_p \in H :
\sum_{p=1}^{+\infty} \frac{x_p^2}{\lambda_p^2} < +\infty \right\}.$$

This domain is dense in $H$. It is not an open set for the norm topology of
$H$. The operator $\Gamma^{-1}$ is unbounded, which means that it is
continuous at no point of $\mathcal{D}(\Gamma^{-1})$. Besides,
$\Gamma^{-1} \Gamma = I_H$ but
$\Gamma \Gamma^{-1} = I_{|\mathcal{D}(\Gamma^{-1})}$, and
$\Gamma \Gamma^{-1}$, which is not defined on the whole of $H$, may be
continuously extended to $H$.

For similar reasons, (7) implies
$\Delta \Gamma^{-1} = \rho_{|\mathcal{D}(\Gamma^{-1})} \ne \rho$, and
$\Delta \Gamma^{-1}$ may be continuously and formally extended to the whole
of $H$. In fact $\Gamma$, hence $\Gamma^{-1}$, are unknown. However, even
if $\Gamma$ were totally accessible, we should find a way to regularize the
odd mathematical object that is $\Gamma^{-1}$. Within the literature on
inverse problems (see for instance Groetsch (1993)) one often replaces
$\Gamma^{-1}$ by a linear operator "close" to it but endowed with
additional regularity (continuity/boundedness) properties, say
$\Gamma^{\dagger}$. The Moore-Penrose pseudo-inverse is an example of such
an operator but many other techniques exist. Indeed, starting from

$$\Gamma^{-1}(x) = \sum_{l \in \mathbb{N}} \frac{\pi_l}{\lambda_l}(x)$$

for all $x$ in $\mathcal{D}(\Gamma^{-1})$, one may set for instance:

$$\Gamma_n^{\dagger}(x) = \sum_{l \le k_n} \frac{\pi_l}{\lambda_l}(x),
\qquad (8)$$

$$\Gamma_n^{\dagger}(x) = \sum_{l \in \mathbb{N}}
\frac{\pi_l}{\lambda_l + \alpha_n}(x), \qquad (9)$$

$$\Gamma_n^{\dagger}(x) = \sum_{l \in \mathbb{N}}
\frac{\lambda_l}{\lambda_l^2 + \alpha_n} \pi_l(x), \qquad (10)$$

where $k_n$ is an increasing and unbounded sequence of integers and
$\alpha_n$ a sequence of positive real numbers decreasing to 0. The three
operators in the display above are indexed by $n$, are all bounded with
increasing norm, and are known as the spectral cut-off, penalized and
Tikhonov regularized inverses of $\Gamma$. They share the following
pointwise convergence property:

$$\Gamma_n^{\dagger} x \to \Gamma^{-1} x$$

for all $x$ in $\mathcal{D}(\Gamma^{-1})$.
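
The three regularization schemes act only on the eigenvalues, so they can be compared on a toy spectrum. The sketch below rests on assumptions of ours: a hypothetical spectrum $\lambda_l = l^{-2}$ truncated at 200 terms, a vector $x$ chosen in the range of $\Gamma$, and helper names (`cutoff`, `penalized`, `tikhonov`) that do not come from the literature cited above.

```python
import numpy as np

def cutoff(lam, k):
    """Spectral cut-off: 1/lambda_l on the k largest eigenvalues, 0 elsewhere."""
    out = np.zeros_like(lam)
    out[:k] = 1.0 / lam[:k]
    return out

def penalized(lam, alpha):
    """Penalized inverse: 1 / (lambda_l + alpha)."""
    return 1.0 / (lam + alpha)

def tikhonov(lam, alpha):
    """Tikhonov inverse: lambda_l / (lambda_l**2 + alpha)."""
    return lam / (lam ** 2 + alpha)

# Hypothetical eigenvalues lambda_l = l^{-2}, sorted in decreasing order.
index = np.arange(1, 201)
lam = 1.0 / index ** 2
x = lam / index                 # coordinates of some x in the range of Gamma
exact = x / lam                 # coordinates of Gamma^{-1} x

def error(multipliers):
    """Distance between the regularized inverse applied to x and Gamma^{-1} x."""
    return np.linalg.norm(multipliers * x - exact)

errors_cutoff = [error(cutoff(lam, k)) for k in (5, 50, 200)]
errors_tikhonov = [error(tikhonov(lam, a)) for a in (1e-2, 1e-6, 1e-12)]
```

The errors decrease as $k_n$ grows or $\alpha_n$ shrinks, in line with the pointwise convergence property, while each regularized operator stays bounded (for instance, the penalized multipliers never exceed $1/\alpha_n$).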

In practice, if $\Gamma_n$ is a convergent estimator of $\Gamma$, the
regularizing methods introduced above can be applied to $\Gamma_n$, which
is usually not invertible (see below for an example). It should be noted at
this point that the regularization of the inverse of the covariance
operator also appears in the linear regression model for functional
variables:

$$y = \langle X, \varphi \rangle + \epsilon$$

when estimating the unknown $\varphi$ (see Cardot et al. (2007)).

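Finally, a general scheme to estimate $\rho$ may be proposed with estimates
of $\Gamma$ and $\Delta$ at hand, say $\Gamma_n$ and $\Delta_n$: compute
$\Gamma_n^{\dagger}$ and take for the estimate and the predictor based on
the new input $X_{n+1}$ respectively:

$$\rho_n = \Delta_n \Gamma_n^{\dagger} \quad \text{and} \quad
\rho_n(X_{n+1}).$$

Obvious examples of such estimates are the empirical covariance and
cross-covariance operators

$$\Gamma_n = \frac{1}{n} \sum_{k=1}^{n} X_k \otimes X_k, \qquad
\Delta_n = \frac{1}{n-1} \sum_{k=1}^{n-1} X_{k+1} \otimes X_k$$

where the $X_k$'s were reconstructed by interpolation techniques.
For instance the spectral cut-off version for $\Gamma_n^{\dagger}$ is

$$\Gamma_n^{\dagger} = \sum_{l \le k_n}
\frac{\hat{\pi}_l}{\hat{\lambda}_l}$$

where the eigenvalues $\hat{\lambda}_l$ and the eigenprojectors
$\hat{\pi}_l$ are by-products of the functional PCA of the sample
$X_1, \ldots, X_n$.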
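
The scheme above can be sketched in a few lines on simulated, discretized curves. This is a minimal illustration under assumptions of ours, not the procedure of any particular paper: curves are vectors of $\mathbb{R}^d$, the true operator `rho` is a hypothetical diagonal matrix, and the cut-off `k_n` is fixed by hand rather than chosen from the data.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, k_n = 20, 500, 4          # grid size, sample size, cut-off (all hypothetical)

# Simulate a discretized ARH(1): X_k = rho X_{k-1} + eps_k, with decaying noise scales.
scales = 1.0 / np.arange(1, d + 1)
rho = 0.5 * np.diag(1.0 / np.arange(1, d + 1))
X = np.zeros((n + 1, d))
for k in range(1, n + 1):
    X[k] = rho @ X[k - 1] + scales * rng.standard_normal(d)
X = X[1:]

# Empirical covariance and cross-covariance operators (as d x d matrices).
Gamma_n = X.T @ X / n                          # (1/n) sum of outer(X_k, X_k)
Delta_n = X[1:].T @ X[:-1] / (n - 1)           # (1/(n-1)) sum of outer(X_{k+1}, X_k)

# Functional PCA of the sample, eigenvalues sorted in decreasing order.
eigval, eigvec = np.linalg.eigh(Gamma_n)
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]

# Spectral cut-off inverse, then the estimate of rho and the predictor.
E = eigvec[:, :k_n]
Gamma_dagger = E @ np.diag(1.0 / eigval[:k_n]) @ E.T
rho_n = Delta_n @ Gamma_dagger
prediction = rho_n @ X[-1]
```

By construction `rho_n` has rank at most `k_n`, and `rho_n @ Gamma_n` equals `Delta_n` composed with the projector on the first `k_n` eigenvectors; this is the finite-sample trace of the inverse problem discussed above.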

Remark 2 This inverse problem is the main abstract concern when
inferring on the ARH model. The considerations above are more or less
exposed in all the articles dealing with it, and we guess it will be of
some interest to expose and sum up this issue and some of its solutions in
this monograph.

3.3 Convergence results for the autocorrelation operator and the predictor

As the data are of functional nature, the inference on $\rho$ cannot be
based on likelihood: Lebesgue's measure cannot be defined on
infinite-dimensional spaces. However it must be mentioned that Mourid and
Bensmain (2006) propose to adapt Grenander's theory of sieves (Grenander
(1981) and Geman and Hwang (1982)) to this issue. They prove consistency in
two very important cases: when $\rho$ is a kernel operator and when $\rho$
is Hilbert-Schmidt. In the former case $\rho$ is identified with the
associated kernel $K$, developed on a basis of trigonometric functions
along the sieve:

$$\Theta_m = \left\{ K \in L^2 :
K(t) = c_0 + \sum_{k=1}^{m} \sqrt{2}\, c_k \cos(2\pi k t),\
t \in [0,1],\ \sum_{k=1}^{m} k^2 c_k^2 \le m \right\}.$$

This approach is truly original within the literature on functional data
and could certainly be extended to other problems of linear or non-linear
regression.

The seminal paper dealing with the estimation of the operator $\rho$ dates
back to 1991 and is due to Bosq (1991). Several consistency results are
obtained there, immediately followed by Pumo's (1992) and Mourid's (1995)
PhD theses. Then Pumo (1998) focuses on random functions with values in
$C([0,1])$ with specific techniques. Besse and Cardot (1996), then Besse,
Cardot and Stephenson (2000), implement spline and kernel methodology with
application to climatic variations. Amongst several interesting ideas they
introduce a local covariance estimate:

$$\hat{\Gamma}_h = \frac{\sum_{i=1}^{n} [X_i \otimes X_i]\,
K(\|X_i - X_n\|/h)}{\sum_{i=1}^{n} K(\|X_i - X_n\|/h)}$$

and a local cross-covariance estimate, which emphasize data close to the
last observation. This method makes it possible to consider data with
departures from the stationarity assumption. The estimation of $\rho$ is
also treated in Guillas (2001) and Mas (2004).

A recent paper by Antoniadis and Sapatinas (2003) carries out wavelet
estimation and prediction in the ARH(1) model. The inverse problem is
underlined through a class of estimates stemming from the deterministic
literature on this topic. This class of estimates is compatible with
wavelet techniques and leads to consistency of the predictor. The method is
applied to the "El Nino" dataset, which tends to become a benchmark for
comparing the performances of the predictions.

Ruiz-Medina et al. (2007) consider the functional principal oscillation
pattern (POP) decomposition of the operator $\rho$ as an alternative to the
functional PCA decomposition. They apply a Kalman filter to the state-space
equation obtained at the preceding step and derive the optimal predictor.
This original approach, illustrated by some simulations, seems to be suited
to spatial functional data as well.

Kargin and Onatski (2008) introduce the notion of predictive factor,
which seems to be better suited than the PCA basis to project the data
if one really focuses on the predictor (and not on the operator itself). A
double regularization (penalization and projection) provides them with a
rate of O {n^'^^^ \og^ n) (where /3 > 0) for the prediction mean square error. 

In Mas (2007) the problem of weak convergence is addressed. The main 
results are given in the Theorem below: 

Theorem 5 It is impossible for $\rho_n - \rho$ to converge in distribution
for the classical norm topology of operators. But under moment assumptions,
if $\|\Gamma^{-1/2} \rho\| < +\infty$ and if the spectrum of $\Gamma$ is
convex then when $k_n =$



where $G$ is an $H$-valued Gaussian centered random variable with
covariance operator $\Gamma_\epsilon$ and $\Pi_{k_n}$ is the projector on
the $k_n$ first eigenvectors of $\Gamma_n$.

Remark 3 The first sentence of the Theorem above is quite surprising but is
a direct consequence of the underlying inverse problem. Considering the
predictor instead weakens the topology and has a smoothing effect on
$\rho_n$. This phenomenon, which was exploited in Antoniadis and Sapatinas
(2003), appears as well in the linear regression model for functional data
(see Cardot, Mas and Sarda (2007)).

It should be noted that rates of convergence are difficult to obtain (see
Guillas (2001) or Kargin and Onatski (2008), Theorem 3) and rather slow
with respect to those obtained in the regression model. An exponential
inequality appears in Theorem 8.8 of Bosq (2000), but it seems that a more
systematic study of the mean square prediction error has not been carried
out yet and that optimal bounds are not available.

3.3.1 Hypothesis testing

A very recent article by Horvath, Huskova and Kokoszka (2009) focuses
on the stability of the autocorrelation operator against change-point
alternatives. In fact the model (5) based on the sample
$X_1, \ldots, X_n$ is slightly modified to:

$$X_n = \rho_n(X_{n-1}) + \epsilon_n$$

and the authors test

$$H_0 : \rho_1 = \ldots = \rho_n$$

against the alternative:

$$H_A : \text{there exists } k^* \in \{1, \ldots, n\} :
\rho_1 = \ldots = \rho_{k^*} \ne \rho_{k^*+1} = \ldots = \rho_n.$$

The test is based on the projection of the process $X_n$ on the $p$ first
eigenvectors of the functional PCA and on an accurate approximation of the
long-run covariance matrix. The asymptotic distribution is derived by means
of empirical process techniques. The consistency of the test is obtained,
and a simulation and real case study dealing with credit card transaction
time series is treated.

It turns out that Laukaitis and Rackauskas (2002) considered the same 
sort of problem a few years earlier. They introduce a functional version of 
the partial sum process of estimated residuals: 

$$S(t) = \sum_{k=2}^{\lfloor nt \rfloor} \left[ X_k - \rho(X_{k-1}) \right]$$

and obtain weak convergence results for its normalized version to an H- 
valued Wiener process. This formal theorem yields different strategies (dyadic 
increment of partial sums or moving residual sums) to derive a test. 
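
As a rough numerical illustration of the partial sum process of estimated residuals, the sketch below works with curves discretized on a common grid, so that each curve is a vector and the estimated operator is a matrix. The function name `residual_partial_sums`, the matrix representation of the operator and the $1/\sqrt{n}$ normalization are assumptions made for this example; they are not taken from Laukaitis and Rackauskas (2002).

```python
import numpy as np

def residual_partial_sums(X, rho_hat):
    """Cumulative sums of the residuals X_k - rho_hat(X_{k-1}), k = 2..n.

    X       : array of shape (n, d), one discretized curve per row.
    rho_hat : array of shape (d, d), an estimate of the operator (assumed given).
    Returns an array of shape (n - 1, d); row m holds the sum of the
    first m + 1 residuals.
    """
    X = np.asarray(X, dtype=float)
    residuals = X[1:] - X[:-1] @ rho_hat.T   # row k-1 holds X_k - rho_hat X_{k-1}
    return np.cumsum(residuals, axis=0)

# Toy usage: simulate a discretized ARH(1) with a known contraction rho.
rng = np.random.default_rng(0)
n, d = 200, 5
rho = 0.5 * np.eye(d)
X = np.zeros((n, d))
for k in range(1, n):
    X[k] = rho @ X[k - 1] + rng.normal(size=d)

S = residual_partial_sums(X, rho)
S_normalized = S / np.sqrt(n)                # assumed normalization, for illustration
```

A test statistic would then be built from functionals of `S_normalized` (for instance increments over dyadic blocks); that step is left out here because it depends on the chosen strategy.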

It seems however that hypothesis testing has rarely been addressed, yet the 
topic is quite promising, even if serious theoretical and technical problems 
appear, once again in connection with the inverse problem mentioned earlier 
in this article. 

3.4 Extension of ARH model 

Various extensions of the ARH(1) model have been proposed in order to 
improve its prediction performance. The first one is the natural extension 
to an autoregressive process of order $p$ with $p > 1$, denoted ARH($p$), 
defined by 

$$X_n = \rho_1 X_{n-1} + \ldots + \rho_p X_{n-p} + \varepsilon_n.$$

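Using the Markov representation 

$$Y_n = \rho' Y_{n-1} + \varepsilon'_n$$

where 

$$Y_n = \begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_{n-p+1} \end{pmatrix}, \quad
\rho' = \begin{pmatrix} \rho_1 & \rho_2 & \cdots & \rho_{p-1} & \rho_p \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{pmatrix}, \quad \text{and} \quad
\varepsilon'_n = \begin{pmatrix} \varepsilon_n \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

and $I$ denotes the identity operator, Mourid (2003) obtains asymptotic results 
for projection estimators and predictors. 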
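
In a finite-dimensional approximation, where each autoregression operator is represented by a $d \times d$ matrix, the Markov representation amounts to stacking the last $p$ curves and building a block companion matrix. The sketch below is only an illustration of that construction; the helper name `companion_operator` and the toy matrices are hypothetical.

```python
import numpy as np

def companion_operator(rhos):
    """Block companion matrix of X_n = rho_1 X_{n-1} + ... + rho_p X_{n-p} + eps_n.

    rhos : list of p arrays of shape (d, d).
    Returns an array of shape (p*d, p*d): first block row (rho_1, ..., rho_p),
    identity blocks on the sub-diagonal, zeros elsewhere.
    """
    p = len(rhos)
    d = rhos[0].shape[0]
    big = np.zeros((p * d, p * d))
    big[:d, :] = np.hstack(rhos)
    big[d:, :-d] = np.eye((p - 1) * d)
    return big

# Check on a toy ARH(2): one step of the stacked recursion reproduces the original one.
rng = np.random.default_rng(1)
d = 3
rho1 = 0.4 * np.eye(d)
rho2 = 0.2 * rng.uniform(-1, 1, size=(d, d)) / d
R = companion_operator([rho1, rho2])

x_prev, x_prev2, eps = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
y_prev = np.concatenate([x_prev, x_prev2])                 # stacked (X_{n-1}, X_{n-2})
y_next = R @ y_prev + np.concatenate([eps, np.zeros(d)])   # stacked (X_n, X_{n-1})
x_next = rho1 @ x_prev + rho2 @ x_prev2 + eps
```

The first $d$ coordinates of `y_next` coincide with `x_next` and the remaining ones with `x_prev`, which is why results for the order-one model can be transferred to the stacked process.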

Damon and Guillas (2002) introduced the autoregressive Hilbertian process 
with exogenous variables, denoted ARHX(1), which intends to take 
into account the dependence structure of random curves under the influence 
of explanatory variables. The model is defined by the equation 

$$X_n = \rho(X_{n-1}) + a_1(Z_{n,1}) + \ldots + a_q(Z_{n,q}) + \varepsilon_n, \quad n \in \mathbb{Z}$$

where $a_1, \ldots, a_q$ are bounded linear operators in $H$ and $Z_{n,1}, \ldots, Z_{n,q}$ are 
ARH(1) exogenous variables; they suppose that the noises of the $q + 1$ 
$H$-valued autoregressive processes are independent. They obtain some limit 
theorems, derive consistent estimators, present a simulation study in order 
to illustrate the accuracy of the estimation and compare the forecasts with 
other functional models. 

Guillas (2002) considers an $H$-valued autoregressive stochastic sequence 
$(X_n)$ with several regimes such that the underlying process $(I_n)$ is 
stationary. Under some dependence assumptions on $(I_n)$ he proves the existence 
of a unique stationary solution and states a law of large numbers and the 
consistency of the covariance estimator. Following the same idea, in a recent 
work Mourid (2004) introduces and studies the autoregressive process with 
random operators $X_n = \rho_n X_{n-1} + \varepsilon_n$ where $(\rho_n, n \in \mathbb{Z})$ is stationary and 
independent of $(\varepsilon_n)$. Results similar to those for the classical ARH(1) are obtained. 

A new model, denoted ARHD process, considering the derivative curves 
of an ARH(1) model was introduced by Marion and Pumo (2004). In a 
recent paper Mas and Pumo (2007) introduced and studied a slightly different 
model: 

$$X_n = \phi X_{n-1} + \Psi(X'_{n-1}) + \varepsilon_n$$

where $X_n$ are random functions with values in the Sobolev space $W^{2,1} = 
\{u \in L^2[0,1], u' \in L^2[0,1]\}$, $\phi$ is a compact operator from $W^{2,1}$ to $W^{2,1}$, $\Psi$ is a 
compact operator from $L^2[0,1]$ to $W^{2,1}$ and $\|\phi h + \Psi h'\| < \|h\|$ for $h \in W^{2,1}$. 
Convergent estimates are obtained through an original double penalization 
method. Simulations on real data show that predictions are comparable to 
those obtained by other classical methods based on ARH(1) modelization. 
Tests on the derivative part and models with higher derivatives may be 
interesting from both theoretical and practical points of view. 

3.5 Numerical aspects 

We present in this section some numerical aspects concerning prediction 
when data are curves observed at discrete points. To our knowledge the 
prediction methods based on linear processes are limited to applications of 
the ARH(1) model, since tractable algorithms using general linear processes in 
Hilbert spaces do not exist (see Merlevede (1997)). However, some partial 
results are available for moving average processes in Hilbert spaces, which 
will be briefly discussed in the next section. 

The literature using the ARH(1) model for prediction is rich and varied 
and concerns different domains: 

• Environment: Besse et al. (2000); Antoniadis and Sapatinas (2003); 
Mas and Pumo (2007); Fernandez de Castro et al. (2005); Damon and 
Guillas (2002); 

• Economy and finance: Kargin and Onatski (2008); 

• Electricity consumption: Cavallini et al. (1994); 

• Medical sciences: Marion and Pumo (2004); Glendinning and Fleet 
(2007) 

From a technical point of view the different approaches for implementing 
an ARH model proceed in two steps. The first step consists in decomposing 
the data in some functional basis in order to reconstruct them on the whole 
observed interval. Most of the methods use spline or wavelet bases and 
suppose that curves belong to the Sobolev space $W^{2,k}$ of functions such that 
the $k$-th derivative is square integrable. We invite the reader to refer to the 
papers by Besse and Cardot (1996), Pumo (1998) and Antoniadis and 
Sapatinas (2003) among others for detailed discussions about the use of splines 
and wavelets for numerical estimation and prediction using the ARH(1) model 
and for the numerical results presented hereafter. The second step 
consists in choosing the tuning parameters required by these methods, for example 
the dimension of the projection subspace for the projection estimators. A 
general method used by the preceding authors is based on a cross-validation 
approach, which gives satisfactory results in applications. Note finally that 
alternative approaches to prediction based on ARH(1) modelization are 
proposed by Mokhtari and Mourid (2002) and Mourid and Bensmain (2005). In 
Mokhtari and Mourid (2002) the authors use a Parzen approximation in a 
reproducing kernel space framework. Some simulation studies are presented 
in the recent paper published in 2008 by the same authors. 

Figure 2: Monthly mean El Niño sea surface temperature index from 
January 1950 to December 1986 (panel title: Niño-3 time series, observations until 1986) 
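
The two implementation steps described above can be sketched as follows for curves observed at a few discrete points. This is a minimal illustration under stated assumptions, not the procedure of the cited papers: plain linear interpolation (`np.interp`) stands in for spline or wavelet smoothing, the operator is estimated by projection on the first $k$ empirical eigenvectors, and $k$ is chosen by a simple one-step-ahead hold-out error instead of the authors' cross-validation. All function names and the simulated data are hypothetical.

```python
import numpy as np

def to_common_grid(t_obs, values, grid):
    """Step 1 (assumed variant): linear-spline reconstruction on a common grid."""
    return np.array([np.interp(grid, t_obs, v) for v in values])

def fit_arh1(X, k):
    """Projection estimator of the ARH(1) operator on the first k eigenvectors.

    X : array (n, d) of centered, discretized curves. Returns (V, R) such that
    the one-step predictor of the curve following x is V @ R @ V.T @ x.
    """
    n = X.shape[0]
    gamma = X.T @ X / n                           # empirical covariance operator
    eigval, eigvec = np.linalg.eigh(gamma)        # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:k]
    lam, V = eigval[order], eigvec[:, order]
    scores = X @ V                                # coordinates in the eigenbasis
    cross = scores[1:].T @ scores[:-1] / (n - 1)  # empirical lag-one cross-covariance
    R = cross / lam                               # same as cross @ inv(diag(lam))
    return V, R

def predict_next(V, R, x):
    return V @ (R @ (V.T @ x))

def choose_dimension(X, candidates, n_valid):
    """Step 2 (assumed variant): pick k by one-step error on the last n_valid curves."""
    errors = {}
    for k in candidates:
        err = 0.0
        for i in range(len(X) - n_valid, len(X)):
            V, R = fit_arh1(X[:i], k)             # fit only on curves before the target
            err += np.mean((predict_next(V, R, X[i - 1]) - X[i]) ** 2)
        errors[k] = err / n_valid
    return min(errors, key=errors.get), errors

# Simulated example: 300 curves observed at 12 points, driven by 3 sine components.
rng = np.random.default_rng(2)
t_obs = np.linspace(0.0, 1.0, 12)
grid = np.linspace(0.0, 1.0, 50)
basis = np.array([np.sin((j + 1) * np.pi * t_obs) for j in range(3)])
A = np.diag([0.7, 0.4, 0.1])                      # autoregression in coefficient space
coef = np.zeros((300, 3))
for i in range(1, 300):
    coef[i] = A @ coef[i - 1] + rng.normal(scale=[1.0, 0.5, 0.25])
raw = coef @ basis                                # shape (300, 12)

X = to_common_grid(t_obs, raw, grid)
X = X - X.mean(axis=0)                            # centering (uses the whole sample)
k_best, cv_errors = choose_dimension(X, candidates=[1, 2, 3], n_valid=20)
V, R = fit_arh1(X, k_best)
forecast = predict_next(V, R, X[-1])              # predicted next curve on the grid
```

Dividing by the eigenvalues in `fit_arh1` is where the inverse problem shows up in practice: keeping `k` small avoids dividing by eigenvalues close to zero, at the price of a projection bias.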

In order to compare the methods described above we consider a 
climatological time series describing the El Niño-Southern Oscillation (see for example 
Besse et al. (2000) or Smith et al. (1996) for a description of the data). The 
series gives the monthly mean El Niño sea surface temperature index from 
January 1950 to December 1986 and is presented in Figure 2. We compare 
the ARHD predictor with various functional prediction methods. 

We compare the predictors of the monthly temperature during 1986, knowing 
the data until 1985, using two criteria: mean-squared error (MSE) and relative 
mean-absolute error (RMAE) defined by: 

$$\mathrm{MSE} = \frac{1}{12} \sum_{i=1}^{12} \left( X_i - \hat{X}_i \right)^2, \qquad \mathrm{RMAE} = \frac{1}{12} \sum_{i=1}^{12} \frac{|X_i - \hat{X}_i|}{X_i}$$

where $X_i$ (resp. $\hat{X}_i$) denotes the $i$-th month observation (resp. 
prediction). The data are freely available from 
http://www.cpc.ncep.noaa.gov/data/indices/index.html 

Prediction method        MSE      RMAE (%)
Wavelet II               0.063    0.89
Splines                  0.065    0.89
ARHD                     0.167    1.25
ARH(1): linear spline    0.278    2.4
SARIMA                   1.457    3.72

Table 1: Mean squared error (MSE) and relative mean-absolute error (RMAE) 
for prediction of the El Niño index during 1986 

The two criteria for the various functional predictors are given in Table 1. 
Results show that the best methods are Wavelet II (one of the wavelet 
approaches proposed in Antoniadis and Sapatinas (2003)) and spline 
smoothing ARH(1). Overall, the predictors obtained using the ARH(1) model are 
better and numerically faster than the classical SARIMA $(0,1,1) \times (1,0,1)_{12}$ 
model (the best SARIMA model based on classical criteria). 
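
For readers who wish to reproduce this kind of comparison, the two criteria can be computed as in the short sketch below. It assumes the reading of the definitions used here (squared errors averaged over the forecast horizon for the MSE, absolute errors divided by the observation for the RMAE). The function names and the numbers are purely illustrative and are not the El Niño data.

```python
def mse(observed, predicted):
    """Mean squared error over the forecast horizon."""
    return sum((x - p) ** 2 for x, p in zip(observed, predicted)) / len(observed)

def rmae(observed, predicted):
    """Relative mean absolute error: mean of |x - p| / x (observations assumed positive)."""
    return sum(abs(x - p) / x for x, p in zip(observed, predicted)) / len(observed)

# Hypothetical values, NOT the El Nino data: four observations and their predictions.
observed = [25.0, 26.0, 27.0, 28.0]
predicted = [25.5, 26.0, 26.0, 28.5]

print(mse(observed, predicted))          # 0.375
print(100 * rmae(observed, predicted))   # RMAE expressed in percent
```

Because the RMAE divides by the observation, it is only meaningful for series that stay away from zero, which is the case for a temperature index expressed in degrees Celsius.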

4 Perspectives 

In the preceding sections we emphasized two important statistical problems 
concerning $H$-valued linear processes. The first, discussed in §2.3, relates 
to inference on general or generalized linear processes. Estimation 
aimed at making predictions with such processes seems to raise difficult 
technical problems. Some new results in this direction were obtained recently 
by Bosq (2006) by introducing the moving average process of order $q > 1$, 
MAH($q$). Some partial consistency results for the particular process 
MAH(1) are presented in a paper by Turbillon et al. (2008). An MAH(1) 
is an $H$-valued process satisfying the equation $X_t = \varepsilon_t + \ell(\varepsilon_{t-1})$ where $\ell$ is 
a compact operator and $(\varepsilon_t)$ a strong white noise. It is simple 
to show that this process is invertible when $\|\ell\| < 1$. The 
difficulty in estimating $\ell$, as for real-valued MA processes, stems from 
the fact that the moment equation is not linear, contrary to the ARH(1) 
process. Under mild conditions Turbillon et al. (2008) propose two types of 
estimators for $\ell$ and give consistency results. 
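
The invertibility condition can be checked numerically in a finite-dimensional setting. When the norm of the MAH(1) operator is below 1, the noise is recovered from the observations by the recursion $\varepsilon_t = X_t - \ell(\varepsilon_{t-1})$, and the error caused by an unknown starting value decays geometrically. In the sketch below the operator is a random matrix `ell` rescaled to spectral norm 0.6; this matrix, the dimensions and the function name `recover_noise` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 4, 60
M = rng.normal(size=(d, d))
ell = 0.6 * M / np.linalg.norm(M, 2)       # spectral norm equal to 0.6 < 1

eps = rng.normal(size=(n + 1, d))          # eps[0] plays the role of the unobserved past
X = eps[1:] + eps[:-1] @ ell.T             # X_t = eps_t + ell(eps_{t-1}), n observations

def recover_noise(X, ell):
    """Recursive inversion eps_t = X_t - ell(eps_{t-1}), started without the past noise."""
    eps_hat = np.zeros_like(X)
    eps_hat[0] = X[0]                      # the unknown ell(eps_{-1}) term is ignored
    for t in range(1, len(X)):
        eps_hat[t] = X[t] - ell @ eps_hat[t - 1]
    return eps_hat

eps_hat = recover_noise(X, ell)
err = np.linalg.norm(eps_hat - eps[1:], axis=1)   # recovery error at each time t
```

Here `err[t]` is bounded by `0.6 ** t * err[0]`, so the initial error is forgotten quickly. With an operator of norm larger than 1 the same recursion would in general amplify it, which is the finite-dimensional counterpart of non-invertibility.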

The second direction concerns the ARH(1) model and its extensions. In 
§3.3.1 we recalled some serious theoretical and technical problems with the 
topic of hypothesis testing. But the problem is very important, in particular 
from a practical point of view. As an example let us consider the ARHD 
model and the test addressing the significance of the derivative in the model. 
Another issue may be the characterization of real-valued processes allowing 
an ARH representation or, more generally, a linear process representation. 
While some examples admitting an ARH or MAH representation exist (see the 
book by Bosq (2000)), a general approach to recognizing real processes 
allowing such a representation is an issue for future work. 

The above questions are important from a theoretical point of view, in 
particular for research in statistics. For practitioners who analyze data 
that are discretized curves, it is increasingly necessary to have descriptive 
tools analogous to those available for real-valued ARMA($p$,$q$) processes. In this 
direction, the work by Hyndman and Shang (2008) on visualizing functional 
data and identifying functional outliers is an example. 

Acknowledgement. The authors thank Frederic Ferraty, Yves Romain 
and the whole group STAPH for initiating this work as well as for permanent 
and fruitful collaboration and are grateful to Professor Denis Bosq for helpful 